Tag

#AI efficiency

10 articles

Kimi K3 vs DeepSeek V4 Pro vs GLM-5.2: Open Trillion-Scale MoE Models Compared on Benchmarks, License, and Serving Cost

This article explains Mixture of Experts (MoE) AI models, how they work like teams of specialists, and why they're important for efficient AI performance.

Jul 1813

Baidu's "Unlimited OCR" processes dozens of document pages in one pass by treating memory like human forgetting

Learn how Baidu's Unlimited OCR achieves efficient processing of dozens of document pages in a single pass by mimicking human memory and forgetting mechanisms.

Jul 540

Databricks’ former AI chief thinks he can cut AI’s power bill by 1,000x

This explainer explores how Un0, a new AI system from Databricks' former AI chief, aims to cut AI energy use by as much as 1,000x through novel architectural approaches, and what that could mean for AI development and deployment.

Jun 2531

Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation

Google DeepMind introduces DiffusionGemma, a 26 billion parameter open model using text diffusion to achieve up to 4x faster text generation on GPUs.

Jun 1033

Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughput Gain

Learn how a new AI system enables faster and more efficient continual learning by running multiple experiments at once using LoRA adapters.

May 3068

UCSD and Together AI Research Introduces Parcae: A Stable Architecture for Looped Language Models That Achieves the Quality of a Transformer Twice the Size

Learn how Parcae, a new architecture for looped language models, matches the quality of a Transformer twice its size, and how this could make AI more sustainable and accessible.

Apr 1592

Google's Veo 3.1 Lite cuts video generation costs by more than half

The cheaper model maintains speed and performance, making AI video creation more accessible.

Mar 31102

Alibaba just released Qwen 3.5 Small models: a family of 0.8B to 9B parameters built for on-device applications

This article explains Alibaba's Qwen 3.5 Small Model Series, a new approach to AI model design that emphasizes efficiency and on-device deployment over traditional large-scale parameter increases.

Mar 2156

DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT

Learn about SPCT (Self-Principled Critique Tuning), a method developed by DeepSeek AI that improves how reward models scale at inference time, making AI systems more efficient and cost-effective.

Feb 27157

RAG vs. Context Stuffing: Why selective retrieval is more efficient and reliable than dumping all data into the prompt

As language models gain the ability to process massive context windows, experts argue that selective retrieval methods like RAG remain more efficient and reliable than simply dumping all data into prompts.

Feb 23100